library(topGO)
library(knitr) # to use kable()
library(limma) # to use vennDiagram()
library(ggplot2)
library(variancePartition)
TAG <- params$TAG
VAR <- params$VAR
ANNOTATION <- list(genes = '../2019-07-26/genes/annotation.txt',
isoforms = '../2019-07-26/transcripts/annotation.txt')
DGEDIR <- '../2020-01-08/'
ENRICHMENT <- paste('../2020-01-14/Enrichment', TAG, VAR, 'RData', sep='.')
This is the enrichment analysis for genes regulated by regime. I am using the topGO package (Alexa and Rahnenfuhrer 2019), which is able to apply different algorithms. The starting point is an ordering of genes usually according to their association with a phenotype or any other relevant quantity, such as a measure of differential expression between two conditions. In this case, we order genes by their differential expression between regime levels. Following one of the suggestions in the manual, I originally ordered genes by either \(p\) value of the differential expression analysis, or by the (complement of the) amount of its expression variance explained by regime (see 2020-01-14). Those were quite equivalent orderings. However, they do not distinguish the sign of the fold change: both up- and down-regulated genes contribute to make a gene ontology term significant. It is one of the approaches recommended, and it makes sense, to the extent that among all the genes annotated with a function, some may display contrasting expression patterns, for example if both positive and negative regulators of that function are annotated with the same label. I thus, expect the use of \(p\) values in enrichment analysis to facilitate the detection of high level categories.
In contrast, genes sharing lower level categories (very specific GO terms) are more likely to be expressed in a similar way. Then, an ordering of genes that reflect the sign of the difference in expression levels between the two conditions compared would be more informative. In principle, enrichment methods should be able to detect an enrichment in either end of the list, rather than only in the top. However, I noticed that the Kolmogorov-Smirnov test implemented in topGO only looks at the top of the list. Even if the score can be negative, it treats it as if it always was a p value, and orders genes from lowest to highest. Even if the function that selects for significant genes uses the absolute value of the score, the Kolmogorov-Smirnov test does not use any threshold, and ignores that detail.
In any case, I don’t want to use a different statistic based on a pre-selected subset of significant genes. In addition to that being an arbitrary decision, the ordering of genes would become folded again. I just need to run the enrichment analysis with the KS statistic twice: once with the t statistic, and once with its reverse, \(-t\). The first test will find GO terms overrepresented among the genes that are overexpressed in the regular, relative to the random environment. The second test will find GO terms overrepresented among genes overexpressed in the random environment.
Functional annotation is in 2019-07-26. I will also upload two lists of genes, with either proportion of variance explained by regime or p-value of differential expression test.
load(paste0(DGEDIR, TAG, '.RData'))
tagTStat <- topTable(fitmm, coef = VAR, num = length(fitmm$F), sort.by='t', resort.by='t')[,'t',drop=FALSE]
Annotation <- read.table(ANNOTATION[[TAG]], col.names = c('tagname', 'goterms'))
To initialize the topGOdata object, I need the gene list as a named numeric vector, where the names are the genes identifiers and the numeric values, the \(t\) statistics from the differential expression analysis. The structure() function adds attributes to an object.
TStats <- structure(tagTStat[,1], names = row.names(tagTStat))
TMinus <- -TStats
rm(tagTStat)
There are 18598 genes scored with a \(t\) statistic. The Annotation data frame has more than one GO term for every tag, separated by |. I need a named list.
head(Annotation)
## tagname goterms
## 1 XLOC_000002 GO:0008417|GO:0016020|GO:0006486
## 2 XLOC_000009 GO:0043130|GO:0005515|GO:0043161
## 3 XLOC_000010 GO:0005840|GO:0015935|GO:0003735|GO:0006412
## 4 XLOC_000015 GO:0006355|GO:0003700|GO:0006357|GO:0043565|GO:0005634|GO:0032502
## 5 XLOC_000021 GO:0016567|GO:0006397|GO:0061630|GO:0008270
## 6 XLOC_000036 GO:0005272|GO:0006814|GO:0016020
allgenes2GO <- strsplit(as.character(Annotation$goterms), "|", fixed = TRUE)
names(allgenes2GO) <- Annotation$tagname
rm(Annotation)
There are 9691 genes with GO annotations. But the differential expression analysis includes many more genes. In order to include the not-annotated genes in the enrichment analysis, to see if annotated or not annotated genes are more or less often differentially expressed, I should attribute a GO term to them. According to [http://geneontology.org/docs/faq/] nowadays we express lack of annotation by annotating to the root nodes, i.e. GO:0008150 biological_process, GO:0003674 molecular_function, and GO:0005575 cellular_component.
for (tag in names(TStats)) {
if (! tag %in% names(allgenes2GO)) {
allgenes2GO <- append(allgenes2GO,
structure(list(c("GO:0008150", "GO:0003674", "GO:0005575")), names = tag))
}
}
Creation of a topGO dataset is documented in section 4 of topGO’s the user manual: https://bioconductor.org/packages/release/bioc/vignettes/topGO/inst/doc/topGO.pdf. I need to use the new command, and fill up the slots. The annot object must be a function that compiles “a list of GO terms such that each element in the list is a character vector containing all the gene identifiers that are mapped to the respective GO term.” There are several options, that you can check using help(annFUN.gene2GO), for example. The annFUN.gene2GO requires a gene2GO argument, which is the list of gene-to-GO terms I made before. The geneSelectionFun object is a function that selects the interesting (most significant) genes. It is required to perform Fisher’s exact test. The nodeSize is used to prune the GO hierarchy from the terms which have less than n annotated genes.
I generate three datasets, to analyse each of the three ontologies. These are the three topGO objects for the regular regime.
# with 'ks' statistics, the selection function only affects the display of
# presumed numbers of 'significant' genes, I think.
selection <- function(allScores) {return(allScores < -5.5)}
GOdata.BP <- new('topGOdata',
description = 'Differential expression among Brachionus plicatilis populations.',
ontology = 'BP',
allGenes = TStats,
annot = annFUN.gene2GO,
gene2GO = allgenes2GO,
geneSelectionFun = selection,
nodeSize = 5)
GOdata.MF <- new('topGOdata',
description = 'Differential expression among Brachionus plicatilis populations.',
ontology = 'MF',
allGenes = TStats,
annot = annFUN.gene2GO,
gene2GO = allgenes2GO,
geneSelectionFun = selection,
nodeSize = 5)
GOdata.CC <- new('topGOdata',
description = 'Differential expression among Brachionus plicatilis populations.',
ontology = 'CC',
allGenes = TStats,
annot = annFUN.gene2GO,
gene2GO = allgenes2GO,
geneSelectionFun = selection,
nodeSize = 5)
DataSummary <- data.frame(ontology = c('BP', 'MF', 'CC'),
Num_Genes = sapply(list(GOdata.BP, GOdata.MF, GOdata.CC), numGenes),
Num_GO_terms = sapply(list(GOdata.BP, GOdata.MF, GOdata.CC), function(x) length(usedGO(x))))
kable(DataSummary, caption='Number of feasible genes or transcripts and number of GO terms used in each data set.')
rm(DataSummary)
| ontology | Num_Genes | Num_GO_terms |
|---|---|---|
| BP | 15162 | 1154 |
| MF | 17429 | 576 |
| CC | 12940 | 291 |
And the following three, for the random regime.
GOminus.BP <- new('topGOdata',
description = 'Differential expression among Brachionus plicatilis populations.',
ontology = 'BP',
allGenes = TMinus,
annot = annFUN.gene2GO,
gene2GO = allgenes2GO,
geneSelectionFun = selection,
nodeSize = 5)
GOminus.MF <- new('topGOdata',
description = 'Differential expression among Brachionus plicatilis populations.',
ontology = 'MF',
allGenes = TMinus,
annot = annFUN.gene2GO,
gene2GO = allgenes2GO,
geneSelectionFun = selection,
nodeSize = 5)
GOminus.CC <- new('topGOdata',
description = 'Differential expression among Brachionus plicatilis populations.',
ontology = 'CC',
allGenes = TMinus,
annot = annFUN.gene2GO,
gene2GO = allgenes2GO,
geneSelectionFun = selection,
nodeSize = 5)
There are more than one way to test for enrichment. Something that took me a while to understand is that not only there are different test statistics (Fisher’s exact test, Kolmogorov-Smirnov, and others) but also different algorithms: classic, elim, weight… The algorithms are ways to deal with the dependence structure among GO terms due to topology. Some algorithms are compatible with all statistics implemented in topGO. But the weight and the parentchild algorithms are only applicable to Fisher’s exact test. I am not interested in the classic algorithm, which treats GO nodes as independent, and therefore produces an excess of significant results. I will not use the Fisher’s exact test, because it dependes on an arbitrary threshold of significance on non-adjusted p-values.
I start testing for enrichment of GO terms among genes overexpressed in the regular environment.
BP.reg.elim <- runTest(GOdata.BP, algorithm = "elim", statistic = "ks")
BP.reg.weight01 <- runTest(GOdata.BP, algorithm = "weight01", statistic = "ks")
BP.reg.lea <- runTest(GOdata.BP, algorithm = "lea", statistic = "ks")
MF.reg.elim <- runTest(GOdata.MF, algorithm = "elim", statistic = "ks")
MF.reg.weight01 <- runTest(GOdata.MF, algorithm = "weight01", statistic = "ks")
MF.reg.lea <- runTest(GOdata.MF, algorithm = "lea", statistic = "ks")
CC.reg.elim <- runTest(GOdata.CC, algorithm = "elim", statistic = "ks")
CC.reg.weight01 <- runTest(GOdata.CC, algorithm = "weight01", statistic = "ks")
CC.reg.lea <- runTest(GOdata.CC, algorithm = "lea", statistic = "ks")
ResultsSummary <- data.frame(ontology = rep(c("BP", "MF", "CC"), each = 3),
algorithm = rep(c("elim", "weight01", "lea"), 3),
TermsTested = sapply(list(BP.reg.elim, BP.reg.weight01, BP.reg.lea,
MF.reg.elim, MF.reg.weight01, MF.reg.lea,
CC.reg.elim, CC.reg.weight01, CC.reg.lea), function(X) length(score(X))),
Significant = sapply(list(BP.reg.elim, BP.reg.weight01, BP.reg.lea,
MF.reg.elim, MF.reg.weight01, MF.reg.lea,
CC.reg.elim, CC.reg.weight01, CC.reg.lea), function(X) sum(score(X) < 0.01)))
kable(ResultsSummary, caption="Number of non-trivial terms tested and those with a score (not corrected p-value) lower than 0.01.")
| ontology | algorithm | TermsTested | Significant |
|---|---|---|---|
| BP | elim | 1154 | 31 |
| BP | weight01 | 1154 | 32 |
| BP | lea | 1154 | 46 |
| MF | elim | 576 | 47 |
| MF | weight01 | 576 | 42 |
| MF | lea | 576 | 68 |
| CC | elim | 291 | 17 |
| CC | weight01 | 291 | 17 |
| CC | lea | 291 | 30 |
rm(ResultsSummary)
And now in the random environment:
BP.ran.elim <- runTest(GOminus.BP, algorithm = "elim", statistic = "ks")
BP.ran.weight01 <- runTest(GOminus.BP, algorithm = "weight01", statistic = "ks")
BP.ran.lea <- runTest(GOminus.BP, algorithm = "lea", statistic = "ks")
MF.ran.elim <- runTest(GOminus.MF, algorithm = "elim", statistic = "ks")
MF.ran.weight01 <- runTest(GOminus.MF, algorithm = "weight01", statistic = "ks")
MF.ran.lea <- runTest(GOminus.MF, algorithm = "lea", statistic = "ks")
CC.ran.elim <- runTest(GOminus.CC, algorithm = "elim", statistic = "ks")
CC.ran.weight01 <- runTest(GOminus.CC, algorithm = "weight01", statistic = "ks")
CC.ran.lea <- runTest(GOminus.CC, algorithm = "lea", statistic = "ks")
ResultsSummary <- data.frame(ontology = rep(c("BP", "MF", "CC"), each = 3),
algorithm = rep(c("elim", "weight01", "lea"), 3),
TermsTested = sapply(list(BP.ran.elim, BP.ran.weight01, BP.ran.lea,
MF.ran.elim, MF.ran.weight01, MF.ran.lea,
CC.ran.elim, CC.ran.weight01, CC.ran.lea), function(X) length(score(X))),
Significant = sapply(list(BP.ran.elim, BP.ran.weight01, BP.ran.lea,
MF.ran.elim, MF.ran.weight01, MF.ran.lea,
CC.ran.elim, CC.ran.weight01, CC.ran.lea), function(X) sum(score(X) < 0.01)))
kable(ResultsSummary, caption="Number of non-trivial terms tested and those with a score (not corrected p-value) lower than 0.01.")
| ontology | algorithm | TermsTested | Significant |
|---|---|---|---|
| BP | elim | 1154 | 76 |
| BP | weight01 | 1154 | 53 |
| BP | lea | 1154 | 151 |
| MF | elim | 576 | 32 |
| MF | weight01 | 576 | 39 |
| MF | lea | 576 | 51 |
| CC | elim | 291 | 43 |
| CC | weight01 | 291 | 27 |
| CC | lea | 291 | 69 |
rm(ResultsSummary)
orderedTerms <- names(sort(score(BP.reg.weight01)))
significant.weight01 <- score(BP.reg.weight01)[orderedTerms] <= 0.005
significant.lea <- score(BP.reg.lea)[orderedTerms] <= 0.005
significant.elim <- score(BP.reg.elim)[orderedTerms] <= 0.005
sigTerms <- orderedTerms[significant.weight01 & significant.lea & significant.elim]
# sigTerms gets updated, and is already used in the code, but I need a permanent copy for later.
BP.reg.sigTerms <- sigTerms
BP.all <- GenTable(GOdata.BP, elim=BP.reg.elim, weight01=BP.reg.weight01, lea=BP.reg.lea,
orderBy="weight01", ranksOf="elim", topNodes=sum(significant.elim))
kable(
BP.all[BP.all$GO.ID %in% sigTerms,],
caption = "Most overrepresented Biological Process terms in predictable environments")
| GO.ID | Term | Annotated | Significant | Expected | Rank in elim | elim | weight01 | lea | |
|---|---|---|---|---|---|---|---|---|---|
| 1 | GO:0007186 | G protein-coupled receptor signaling pat… | 206 | 1 | 0.76 | 1 | 1.3e-26 | 4.4e-27 | 1.3e-26 |
| 2 | GO:0006508 | proteolysis | 442 | 4 | 1.63 | 3 | 8.1e-10 | 8.4e-19 | 8.1e-10 |
| 3 | GO:0006814 | sodium ion transport | 88 | 0 | 0.33 | 2 | 5.8e-10 | 5.8e-10 | 5.8e-10 |
| 4 | GO:0006030 | chitin metabolic process | 74 | 3 | 0.27 | 4 | 1.2e-09 | 3.0e-09 | 1.2e-09 |
| 5 | GO:0006813 | potassium ion transport | 56 | 0 | 0.21 | 5 | 1.4e-07 | 1.4e-07 | 8.8e-10 |
| 6 | GO:0034220 | ion transmembrane transport | 147 | 0 | 0.54 | 10 | 2.6e-05 | 1.9e-07 | 2.2e-08 |
| 7 | GO:0055085 | transmembrane transport | 500 | 2 | 1.85 | 7 | 1.2e-06 | 4.2e-07 | 6.9e-13 |
| 8 | GO:0006486 | protein glycosylation | 75 | 0 | 0.28 | 11 | 0.00011 | 6.6e-07 | 0.00011 |
| 9 | GO:0005975 | carbohydrate metabolic process | 202 | 6 | 0.75 | 18 | 0.00121 | 3.3e-06 | 0.00024 |
| 10 | GO:0060271 | cilium assembly | 46 | 0 | 0.17 | 6 | 9.5e-07 | 7.6e-06 | 1.4e-08 |
| 11 | GO:0015074 | DNA integration | 34 | 0 | 0.13 | 8 | 1.5e-05 | 1.5e-05 | 1.5e-05 |
| 12 | GO:0006811 | ion transport | 436 | 0 | 1.61 | 12 | 0.00014 | 2.6e-05 | 7.0e-19 |
| 14 | GO:0007156 | homophilic cell adhesion via plasma memb… | 21 | 0 | 0.08 | 13 | 0.00015 | 0.00015 | 0.00015 |
| 15 | GO:0007160 | cell-matrix adhesion | 16 | 0 | 0.06 | 14 | 0.00020 | 0.00020 | 0.00020 |
| 16 | GO:0003341 | cilium movement | 10 | 0 | 0.04 | 15 | 0.00026 | 0.00026 | 0.00026 |
| 18 | GO:0070588 | calcium ion transmembrane transport | 30 | 0 | 0.11 | 16 | 0.00101 | 0.00101 | 0.00101 |
| 19 | GO:0005992 | trehalose biosynthetic process | 6 | 2 | 0.02 | 17 | 0.00110 | 0.00110 | 0.00110 |
| 20 | GO:0007154 | cell communication | 565 | 1 | 2.09 | 9 | 1.9e-05 | 0.00132 | 0.00016 |
vennDiagram(vennCounts(cbind(weight01=significant.weight01,
lea=significant.lea,
elim=significant.elim)))
kable(data.frame(Term = Term(GOTERM[sigTerms]),
Definition = Definition(GOTERM[sigTerms]),
PValue=score(BP.reg.weight01)[sigTerms]),
caption = 'BP terms overrepresented in predictable environments')
| Term | Definition | PValue | |
|---|---|---|---|
| GO:0007186 | G protein-coupled receptor signaling pathway | A series of molecular signals that proceeds with an activated receptor promoting the exchange of GDP for GTP on the alpha-subunit of an associated heterotrimeric G-protein complex. The GTP-bound activated alpha-G-protein then dissociates from the beta- and gamma-subunits to further transmit the signal within the cell. The pathway begins with receptor-ligand interaction, or for basal GPCR signaling the pathway begins with the receptor activating its G protein in the absence of an agonist, and ends with regulation of a downstream cellular process, e.g. transcription. The pathway can start from the plasma membrane, Golgi or nuclear membrane (PMID:24568158 and PMID:16902576). | 0.0000000 |
| GO:0006508 | proteolysis | The hydrolysis of proteins into smaller polypeptides and/or amino acids by cleavage of their peptide bonds. | 0.0000000 |
| GO:0006814 | sodium ion transport | The directed movement of sodium ions (Na+) into, out of or within a cell, or between cells, by means of some agent such as a transporter or pore. | 0.0000000 |
| GO:0006030 | chitin metabolic process | The chemical reactions and pathways involving chitin, a linear polysaccharide consisting of beta-(1->4)-linked N-acetyl-D-glucosamine residues. | 0.0000000 |
| GO:0006813 | potassium ion transport | The directed movement of potassium ions (K+) into, out of or within a cell, or between cells, by means of some agent such as a transporter or pore. | 0.0000001 |
| GO:0034220 | ion transmembrane transport | A process in which an ion is transported across a membrane. | 0.0000002 |
| GO:0055085 | transmembrane transport | The process in which a solute is transported across a lipid bilayer, from one side of a membrane to the other. | 0.0000004 |
| GO:0006486 | protein glycosylation | A protein modification process that results in the addition of a carbohydrate or carbohydrate derivative unit to a protein amino acid, e.g. the addition of glycan chains to proteins. | 0.0000007 |
| GO:0005975 | carbohydrate metabolic process | The chemical reactions and pathways involving carbohydrates, any of a group of organic compounds based of the general formula Cx(H2O)y. Includes the formation of carbohydrate derivatives by the addition of a carbohydrate residue to another molecule. | 0.0000033 |
| GO:0060271 | cilium assembly | The assembly of a cilium, a specialized eukaryotic organelle that consists of a filiform extrusion of the cell surface. Each cilium is bounded by an extrusion of the cytoplasmic membrane, and contains a regular longitudinal array of microtubules, anchored basally in a centriole. | 0.0000076 |
| GO:0015074 | DNA integration | The process in which a segment of DNA is incorporated into another, usually larger, DNA molecule such as a chromosome. | 0.0000147 |
| GO:0006811 | ion transport | The directed movement of charged atoms or small charged molecules into, out of or within a cell, or between cells, by means of some agent such as a transporter or pore. | 0.0000263 |
| GO:0007156 | homophilic cell adhesion via plasma membrane adhesion molecules | The attachment of a plasma membrane adhesion molecule in one cell to an identical molecule in an adjacent cell. | 0.0001501 |
| GO:0007160 | cell-matrix adhesion | The binding of a cell to the extracellular matrix via adhesion molecules. | 0.0001997 |
| GO:0003341 | cilium movement | The directed, self-propelled movement of a cilium. | 0.0002568 |
| GO:0070588 | calcium ion transmembrane transport | A process in which a calcium ion is transported from one side of a membrane to the other by means of some agent such as a transporter or pore. | 0.0010143 |
| GO:0005992 | trehalose biosynthetic process | The chemical reactions and pathways resulting in the formation of trehalose, a disaccharide isomeric with sucrose and obtained from certain lichens and fungi. | 0.0010957 |
| GO:0007154 | cell communication | Any process that mediates interactions between a cell and its surroundings. Encompasses interactions such as signaling or attachment between one cell and another cell, between a cell and an extracellular matrix, or between a cell and any other aspect of its environment. | 0.0013231 |
| GO:0007018 | microtubule-based movement | A microtubule-based process that results in the movement of organelles, other microtubules, or other cellular components. Examples include motor-driven movement along microtubules and movement driven by polymerization or depolymerization of microtubules. | 0.0036411 |
I think the GO graph is useful to see the relationship among the significant terms. But too large graphs are impossible to read. I don’t know how to split the graph in meaningful subgraphs.
showSigOfNodes(GOdata.BP, score(BP.reg.weight01),
firstSigNodes = sum(significant.elim),
wantedNodes = sigTerms)
## $dag
## A graphNEL graph with directed edges
## Number of Nodes = 130
## Number of Edges = 236
##
## $complete.dag
## [1] "A graph with 130 nodes."
This is just a example of the most significant GO term:
showGroupDensity(GOdata.BP, orderedTerms[1], rm.one=FALSE)
showGroupDensity(GOdata.BP, orderedTerms[2], rm.one=FALSE)
showGroupDensity(GOdata.BP, orderedTerms[3], rm.one=FALSE)
orderedTerms <- names(sort(score(MF.reg.weight01)))
significant.weight01 <- score(MF.reg.weight01)[orderedTerms] <= 0.005
significant.lea <- score(MF.reg.lea)[orderedTerms] <= 0.005
significant.elim <- score(MF.reg.elim)[orderedTerms] <= 0.005
sigTerms <- orderedTerms[significant.weight01 & significant.lea & significant.elim]
# sigTerms gets updated, and is already used in the code, but I need a permanent copy for later.
MF.reg.sigTerms <- sigTerms
MF.all <- GenTable(GOdata.MF, elim=MF.reg.elim, weight01=MF.reg.weight01, lea=MF.reg.lea,
orderBy="weight01", ranksOf="elim", topNodes=sum(significant.elim))
kable(
MF.all[MF.all$GO.ID %in% sigTerms,],
caption = "Most overrepresented Molecular Function terms in predictable environments.")
| GO.ID | Term | Annotated | Significant | Expected | Rank in elim | elim | weight01 | lea | |
|---|---|---|---|---|---|---|---|---|---|
| 1 | GO:0005509 | calcium ion binding | 361 | 0 | 1.43 | 1 | < 1e-30 | < 1e-30 | < 1e-30 |
| 2 | GO:0004930 | G protein-coupled receptor activity | 180 | 1 | 0.71 | 2 | 5.7e-23 | 9.3e-22 | 1.3e-26 |
| 3 | GO:0043565 | sequence-specific DNA binding | 146 | 0 | 0.58 | 4 | 2.0e-09 | 1.4e-11 | 2.0e-09 |
| 4 | GO:0008061 | chitin binding | 77 | 2 | 0.30 | 3 | 1.1e-09 | 1.1e-09 | 1.1e-09 |
| 5 | GO:0005272 | sodium channel activity | 73 | 0 | 0.29 | 5 | 7.0e-09 | 7.0e-09 | 7.0e-09 |
| 6 | GO:0004252 | serine-type endopeptidase activity | 88 | 2 | 0.35 | 6 | 2.3e-08 | 2.3e-08 | 2.3e-08 |
| 7 | GO:0003774 | motor activity | 131 | 0 | 0.52 | 7 | 1.3e-07 | 1.3e-07 | 2.9e-07 |
| 8 | GO:0004181 | metallocarboxypeptidase activity | 19 | 0 | 0.08 | 8 | 2.2e-07 | 2.2e-07 | 2.2e-07 |
| 9 | GO:0005515 | protein binding | 2101 | 10 | 8.32 | 17 | 0.00070 | 1.4e-06 | 0.00128 |
| 10 | GO:0005249 | voltage-gated potassium channel activity | 27 | 0 | 0.11 | 9 | 1.7e-06 | 1.7e-06 | 1.7e-06 |
| 11 | GO:0004888 | transmembrane signaling receptor activit… | 251 | 1 | 0.99 | 11 | 6.3e-06 | 6.1e-06 | 3.6e-30 |
| 12 | GO:0008417 | fucosyltransferase activity | 33 | 0 | 0.13 | 12 | 8.8e-06 | 8.8e-06 | 8.8e-06 |
| 13 | GO:0005230 | extracellular ligand-gated ion channel a… | 54 | 0 | 0.21 | 10 | 1.7e-06 | 1.0e-05 | 9.2e-08 |
| 14 | GO:0005200 | structural constituent of cytoskeleton | 17 | 2 | 0.07 | 13 | 0.00023 | 0.00023 | 0.00023 |
| 15 | GO:0004222 | metalloendopeptidase activity | 73 | 1 | 0.29 | 14 | 0.00031 | 0.00031 | 0.00031 |
| 16 | GO:0005328 | neurotransmitter:sodium symporter activi… | 18 | 0 | 0.07 | 15 | 0.00055 | 0.00055 | 0.00055 |
| 17 | GO:0051015 | actin filament binding | 30 | 0 | 0.12 | 16 | 0.00070 | 0.00070 | 0.00070 |
| 18 | GO:0003777 | microtubule motor activity | 98 | 0 | 0.39 | 20 | 0.00085 | 0.00085 | 0.00085 |
| 19 | GO:0030248 | cellulose binding | 7 | 1 | 0.03 | 21 | 0.00090 | 0.00090 | 0.00090 |
| 20 | GO:0020037 | heme binding | 95 | 1 | 0.38 | 23 | 0.00112 | 0.00112 | 0.00112 |
| 21 | GO:0016715 | oxidoreductase activity, acting on paire… | 14 | 1 | 0.06 | 24 | 0.00123 | 0.00123 | 0.00123 |
| 22 | GO:0004096 | catalase activity | 15 | 0 | 0.06 | 25 | 0.00156 | 0.00156 | 0.00156 |
| 23 | GO:0031409 | pigment binding | 10 | 1 | 0.04 | 28 | 0.00172 | 0.00172 | 0.00172 |
| 25 | GO:0004983 | neuropeptide Y receptor activity | 9 | 0 | 0.04 | 29 | 0.00188 | 0.00188 | 0.00188 |
| 26 | GO:0015293 | symporter activity | 32 | 0 | 0.13 | 34 | 0.00447 | 0.00238 | 1.6e-05 |
| 27 | GO:0008378 | galactosyltransferase activity | 23 | 0 | 0.09 | 30 | 0.00256 | 0.00256 | 0.00256 |
| 28 | GO:0004601 | peroxidase activity | 34 | 1 | 0.13 | 31 | 0.00300 | 0.00300 | 2.8e-05 |
| 29 | GO:0004190 | aspartic-type endopeptidase activity | 28 | 0 | 0.11 | 32 | 0.00306 | 0.00306 | 0.00306 |
| 31 | GO:0008528 | G protein-coupled peptide receptor activ… | 15 | 0 | 0.06 | 33 | 0.00428 | 0.00428 | 1.6e-05 |
vennDiagram(vennCounts(cbind(weight01=significant.weight01,
lea=significant.lea,
elim=significant.elim)))
kable(data.frame(Term=Term(GOTERM[sigTerms]),
Definition=Definition(GOTERM[sigTerms]),
PValue=score(MF.reg.weight01)[sigTerms]),
caption = 'MF terms overrepresented in predictable environments')
| Term | Definition | PValue | |
|---|---|---|---|
| GO:0005509 | calcium ion binding | Interacting selectively and non-covalently with calcium ions (Ca2+). | 0.0000000 |
| GO:0004930 | G protein-coupled receptor activity | Combining with an extracellular signal and transmitting the signal across the membrane by activating an associated G-protein; promotes the exchange of GDP for GTP on the alpha subunit of a heterotrimeric G-protein complex. | 0.0000000 |
| GO:0043565 | sequence-specific DNA binding | Interacting selectively and non-covalently with DNA of a specific nucleotide composition, e.g. GC-rich DNA binding, or with a specific sequence motif or type of DNA e.g. promotor binding or rDNA binding. | 0.0000000 |
| GO:0008061 | chitin binding | Interacting selectively and non-covalently with chitin, a linear polysaccharide consisting of beta-(1->4)-linked N-acetyl-D-glucosamine residues. | 0.0000000 |
| GO:0005272 | sodium channel activity | Enables the facilitated diffusion of a sodium ion (by an energy-independent process) involving passage through a transmembrane aqueous pore or channel without evidence for a carrier-mediated mechanism. | 0.0000000 |
| GO:0004252 | serine-type endopeptidase activity | Catalysis of the hydrolysis of internal, alpha-peptide bonds in a polypeptide chain by a catalytic mechanism that involves a catalytic triad consisting of a serine nucleophile that is activated by a proton relay involving an acidic residue (e.g. aspartate or glutamate) and a basic residue (usually histidine). | 0.0000000 |
| GO:0003774 | motor activity | Catalysis of the generation of force resulting either in movement along a microfilament or microtubule, or in torque resulting in membrane scission, coupled to the hydrolysis of a nucleoside triphosphate. | 0.0000001 |
| GO:0004181 | metallocarboxypeptidase activity | Catalysis of the hydrolysis of C-terminal amino acid residues from a polypeptide chain by a mechanism in which water acts as a nucleophile, one or two metal ions hold the water molecule in place, and charged amino acid side chains are ligands for the metal ions. | 0.0000002 |
| GO:0005515 | protein binding | Interacting selectively and non-covalently with any protein or protein complex (a complex of two or more proteins that may include other nonprotein molecules). | 0.0000014 |
| GO:0005249 | voltage-gated potassium channel activity | Enables the transmembrane transfer of a potassium ion by a voltage-gated channel. A voltage-gated channel is a channel whose open state is dependent on the voltage across the membrane in which it is embedded. | 0.0000017 |
| GO:0004888 | transmembrane signaling receptor activity | Combining with an extracellular or intracellular signal and transmitting the signal from one side of the membrane to the other to initiate a change in cell activity or state as part of signal transduction. | 0.0000061 |
| GO:0008417 | fucosyltransferase activity | Catalysis of the transfer of a fucosyl group to an acceptor molecule, typically another carbohydrate or a lipid. | 0.0000088 |
| GO:0005230 | extracellular ligand-gated ion channel activity | Enables the transmembrane transfer of an ion by a channel that opens when a specific extracellular ligand has been bound by the channel complex or one of its constituent parts. | 0.0000100 |
| GO:0005200 | structural constituent of cytoskeleton | The action of a molecule that contributes to the structural integrity of a cytoskeletal structure. | 0.0002305 |
| GO:0004222 | metalloendopeptidase activity | Catalysis of the hydrolysis of internal, alpha-peptide bonds in a polypeptide chain by a mechanism in which water acts as a nucleophile, one or two metal ions hold the water molecule in place, and charged amino acid side chains are ligands for the metal ions. | 0.0003142 |
| GO:0005328 | neurotransmitter:sodium symporter activity | Enables the transfer of a solute or solutes from one side of a membrane to the other according to the reaction: neurotransmitter(out) + Na+(out) = neurotransmitter(in) + Na+(in). | 0.0005460 |
| GO:0051015 | actin filament binding | Interacting selectively and non-covalently with an actin filament, also known as F-actin, a helical filamentous polymer of globular G-actin subunits. | 0.0006982 |
| GO:0003777 | microtubule motor activity | Catalysis of movement along a microtubule, coupled to the hydrolysis of a nucleoside triphosphate (usually ATP). | 0.0008454 |
| GO:0030248 | cellulose binding | Interacting selectively and non-covalently with cellulose. | 0.0009042 |
| GO:0020037 | heme binding | Interacting selectively and non-covalently with heme, any compound of iron complexed in a porphyrin (tetrapyrrole) ring. | 0.0011192 |
| GO:0016715 | oxidoreductase activity, acting on paired donors, with incorporation or reduction of molecular oxygen, reduced ascorbate as one donor, and incorporation of one atom of oxygen | Catalysis of an oxidation-reduction (redox) reaction in which hydrogen or electrons are transferred from reduced ascorbate and one other donor, and one atom of oxygen is incorporated into one donor. | 0.0012308 |
| GO:0004096 | catalase activity | Catalysis of the reaction: 2 hydrogen peroxide = O2 + 2 H2O. | 0.0015600 |
| GO:0031409 | pigment binding | Interacting selectively and non-covalently with a pigment, any general or particular coloring matter in living organisms, e.g. melanin. | 0.0017172 |
| GO:0004983 | neuropeptide Y receptor activity | Combining with neuropeptide Y to initiate a change in cell activity. | 0.0018782 |
| GO:0015293 | symporter activity | Enables the active transport of a solute across a membrane by a mechanism whereby two or more species are transported together in the same direction in a tightly coupled process not directly linked to a form of energy other than chemiosmotic energy. | 0.0023817 |
| GO:0008378 | galactosyltransferase activity | Catalysis of the transfer of a galactosyl group to an acceptor molecule, typically another carbohydrate or a lipid. | 0.0025622 |
| GO:0004601 | peroxidase activity | Catalysis of the reaction: donor + hydrogen peroxide = oxidized donor + 2 H2O. | 0.0030028 |
| GO:0004190 | aspartic-type endopeptidase activity | Catalysis of the hydrolysis of internal, alpha-peptide bonds in a polypeptide chain by a mechanism in which a water molecule bound by the side chains of aspartic residues at the active center acts as a nucleophile. | 0.0030648 |
| GO:0008528 | G protein-coupled peptide receptor activity | Combining with a peptide and transmitting the signal across the membrane by activating an associated G-protein; promotes the exchange of GDP for GTP on the alpha subunit of a heterotrimeric G-protein complex. | 0.0042829 |
showSigOfNodes(GOdata.MF, score(MF.reg.weight01),
firstSigNodes = sum(significant.elim),
wantedNodes = sigTerms)
## $dag
## A graphNEL graph with directed edges
## Number of Nodes = 116
## Number of Edges = 151
##
## $complete.dag
## [1] "A graph with 116 nodes."
showGroupDensity(GOdata.MF, orderedTerms[1], rm.one=FALSE)
showGroupDensity(GOdata.MF, orderedTerms[2], rm.one=FALSE)
showGroupDensity(GOdata.MF, orderedTerms[3], rm.one=FALSE)
orderedTerms <- names(sort(score(CC.reg.weight01)))
significant.weight01 <- score(CC.reg.weight01)[orderedTerms] <= 0.005
significant.lea <- score(CC.reg.lea)[orderedTerms] <= 0.005
significant.elim <- score(CC.reg.elim)[orderedTerms] <= 0.005
sigTerms <- orderedTerms[significant.weight01 & significant.lea & significant.elim]
# sigTerms gets updated, and is already used in the code, but I need a permanent copy for later.
CC.reg.sigTerms <- sigTerms
CC.all <- GenTable(GOdata.CC, elim=CC.reg.elim, weight01=CC.reg.weight01, lea=CC.reg.lea,
orderBy="weight01", ranksOf="elim", topNodes=sum(significant.elim))
kable(
CC.all[CC.all$GO.ID %in% sigTerms,],
caption = "Most overrepresented Cellular Component terms in predictable environments")
| GO.ID | Term | Annotated | Significant | Expected | Rank in elim | elim | weight01 | lea |
|---|---|---|---|---|---|---|---|---|
| GO:0016021 | integral component of membrane | 885 | 1 | 2.60 | 1 | < 1e-30 | < 1e-30 | < 1e-30 |
| GO:0016020 | membrane | 1565 | 3 | 4.60 | 3 | 6.9e-12 | 1.5e-19 | < 1e-30 |
| GO:0005576 | extracellular region | 140 | 4 | 0.41 | 2 | 4.1e-15 | 7.5e-14 | 1.2e-17 |
| GO:0016459 | myosin complex | 33 | 0 | 0.10 | 4 | 5.2e-08 | 5.2e-08 | 5.2e-08 |
| GO:0008076 | voltage-gated potassium channel complex | 13 | 0 | 0.04 | 5 | 3.2e-06 | 3.2e-06 | 3.2e-06 |
| GO:0005886 | plasma membrane | 170 | 0 | 0.50 | 6 | 0.00019 | 0.00016 | 2.1e-12 |
| GO:0098797 | plasma membrane protein complex | 34 | 0 | 0.10 | 8 | 0.00033 | 0.00049 | 3.7e-09 |
| GO:0031012 | extracellular matrix | 9 | 0 | 0.03 | 9 | 0.00081 | 0.00081 | 0.00081 |
| GO:0034704 | calcium channel complex | 5 | 0 | 0.01 | 10 | 0.00185 | 0.00185 | 0.00185 |
| GO:0005874 | microtubule | 29 | 2 | 0.09 | 11 | 0.00221 | 0.00221 | 0.00221 |
| GO:0005929 | cilium | 27 | 0 | 0.08 | 12 | 0.00279 | 0.00279 | 9.8e-08 |
| GO:0042555 | MCM complex | 10 | 3 | 0.03 | 14 | 0.00302 | 0.00302 | 0.00302 |
| GO:0005858 | axonemal dynein complex | 5 | 0 | 0.01 | 15 | 0.00371 | 0.00371 | 0.00371 |
| GO:0045211 | postsynaptic membrane | 16 | 0 | 0.05 | 16 | 0.00428 | 0.00428 | 0.00428 |
vennDiagram(vennCounts(cbind(weight01=significant.weight01,
lea=significant.lea,
elim=significant.elim)))
kable(data.frame(Term=Term(GOTERM[sigTerms]),
Definition=Definition(GOTERM[sigTerms]),
PValue=score(CC.reg.weight01)[sigTerms]),
caption = 'CC terms overrepresented in predictable environments')
| Term | Definition | PValue | |
|---|---|---|---|
| GO:0016021 | integral component of membrane | The component of a membrane consisting of the gene products and protein complexes having at least some part of their peptide sequence embedded in the hydrophobic region of the membrane. | 0.0000000 |
| GO:0016020 | membrane | A lipid bilayer along with all the proteins and protein complexes embedded in it an attached to it. | 0.0000000 |
| GO:0005576 | extracellular region | The space external to the outermost structure of a cell. For cells without external protective or external encapsulating structures this refers to space outside of the plasma membrane. This term covers the host cell environment outside an intracellular parasite. | 0.0000000 |
| GO:0016459 | myosin complex | A protein complex, formed of one or more myosin heavy chains plus associated light chains and other proteins, that functions as a molecular motor; uses the energy of ATP hydrolysis to move actin filaments or to move vesicles or other cargo on fixed actin filaments; has magnesium-ATPase activity and binds actin. Myosin classes are distinguished based on sequence features of the motor, or head, domain, but also have distinct tail regions that are believed to bind specific cargoes. | 0.0000001 |
| GO:0008076 | voltage-gated potassium channel complex | A protein complex that forms a transmembrane channel through which potassium ions may cross a cell membrane in response to changes in membrane potential. | 0.0000032 |
| GO:0005886 | plasma membrane | The membrane surrounding a cell that separates the cell from its external environment. It consists of a phospholipid bilayer and associated proteins. | 0.0001593 |
| GO:0098797 | plasma membrane protein complex | Any protein complex that is part of the plasma membrane. | 0.0004918 |
| GO:0031012 | extracellular matrix | A structure lying external to one or more cells, which provides structural support, biochemical or biomechanical cues for cells or tissues. | 0.0008138 |
| GO:0034704 | calcium channel complex | An ion channel complex through which calcium ions pass. | 0.0018495 |
| GO:0005874 | microtubule | Any of the long, generally straight, hollow tubes of internal diameter 12-15 nm and external diameter 24 nm found in a wide variety of eukaryotic cells; each consists (usually) of 13 protofilaments of polymeric tubulin, staggered in such a manner that the tubulin monomers are arranged in a helical pattern on the microtubular surface, and with the alpha/beta axes of the tubulin subunits parallel to the long axis of the tubule; exist in equilibrium with pool of tubulin monomers and can be rapidly assembled or disassembled in response to physiological stimuli; concerned with force generation, e.g. in the spindle. | 0.0022146 |
| GO:0005929 | cilium | A specialized eukaryotic organelle that consists of a filiform extrusion of the cell surface and of some cytoplasmic parts. Each cilium is largely bounded by an extrusion of the cytoplasmic (plasma) membrane, and contains a regular longitudinal array of microtubules, anchored to a basal body. | 0.0027889 |
| GO:0042555 | MCM complex | A hexameric protein complex required for the initiation and regulation of DNA replication. | 0.0030211 |
| GO:0005858 | axonemal dynein complex | A dynein complex found in eukaryotic cilia and flagella; the motor domain heads interact with adjacent microtubules to generate a sliding force which is converted to a bending motion. | 0.0037087 |
| GO:0045211 | postsynaptic membrane | A specialized area of membrane facing the presynaptic membrane on the tip of the nerve ending and separated from it by a minute cleft (the synaptic cleft). Neurotransmitters cross the synaptic cleft and transmit the signal to the postsynaptic membrane. | 0.0042845 |
showSigOfNodes(GOdata.CC, score(CC.reg.weight01),
firstSigNodes = sum(significant.elim),
wantedNodes = sigTerms)
## $dag
## A graphNEL graph with directed edges
## Number of Nodes = 68
## Number of Edges = 117
##
## $complete.dag
## [1] "A graph with 68 nodes."
showGroupDensity(GOdata.CC, orderedTerms[1], rm.one=FALSE)
showGroupDensity(GOdata.CC, orderedTerms[2], rm.one=FALSE)
showGroupDensity(GOdata.CC, orderedTerms[3], rm.one=FALSE)
orderedTerms <- names(sort(score(BP.ran.weight01)))
significant.weight01 <- score(BP.ran.weight01)[orderedTerms] <= 0.005
significant.lea <- score(BP.ran.lea)[orderedTerms] <= 0.005
significant.elim <- score(BP.ran.elim)[orderedTerms] <= 0.005
sigTerms <- orderedTerms[significant.weight01 & significant.lea & significant.elim]
# sigTerms gets updated, and is already used in the code, but I need a permanent copy for later.
BP.ran.sigTerms <- sigTerms
BP.all <- GenTable(GOminus.BP, elim=BP.ran.elim, weight01=BP.ran.weight01, lea=BP.ran.lea,
orderBy="weight01", ranksOf="elim", topNodes=sum(significant.elim))
kable(
BP.all[BP.all$GO.ID %in% sigTerms,],
caption = "Most overrepresented Biological Process terms in random environments.")
| GO.ID | Term | Annotated | Significant | Expected | Rank in elim | elim | weight01 | lea | |
|---|---|---|---|---|---|---|---|---|---|
| 1 | GO:0006412 | translation | 195 | 0 | 0.98 | 1 | 1.5e-15 | 3.8e-21 | 4.5e-18 |
| 2 | GO:0006511 | ubiquitin-dependent protein catabolic pr… | 102 | 4 | 0.51 | 2 | 4.5e-10 | 8.6e-14 | 4.5e-10 |
| 3 | GO:0016579 | protein deubiquitination | 40 | 3 | 0.20 | 4 | 2.1e-09 | 2.1e-09 | 2.1e-09 |
| 4 | GO:0000398 | mRNA splicing, via spliceosome | 46 | 0 | 0.23 | 5 | 4.1e-07 | 7.4e-08 | 1.8e-10 |
| 5 | GO:0006397 | mRNA processing | 87 | 1 | 0.44 | 3 | 1.3e-09 | 1.4e-07 | 1.3e-17 |
| 6 | GO:0006289 | nucleotide-excision repair | 13 | 0 | 0.07 | 6 | 4.2e-07 | 4.2e-07 | 4.2e-07 |
| 7 | GO:0006351 | transcription, DNA-templated | 442 | 1 | 2.22 | 26 | 0.00120 | 5.0e-07 | 1.9e-09 |
| 8 | GO:0016567 | protein ubiquitination | 46 | 2 | 0.23 | 8 | 4.7e-06 | 4.7e-06 | 4.7e-06 |
| 9 | GO:0006396 | RNA processing | 225 | 2 | 1.13 | 7 | 8.6e-07 | 7.8e-06 | 1.3e-29 |
| 10 | GO:0006886 | intracellular protein transport | 141 | 2 | 0.71 | 9 | 9.6e-06 | 8.3e-06 | 2.6e-09 |
| 11 | GO:0006914 | autophagy | 21 | 0 | 0.11 | 11 | 1.5e-05 | 1.5e-05 | 2.9e-07 |
| 12 | GO:0006367 | transcription initiation from RNA polyme… | 13 | 0 | 0.07 | 12 | 1.8e-05 | 1.8e-05 | 1.8e-05 |
| 13 | GO:0006352 | DNA-templated transcription, initiation | 42 | 0 | 0.21 | 14 | 4.0e-05 | 4.0e-05 | 2.9e-07 |
| 15 | GO:0030163 | protein catabolic process | 123 | 4 | 0.62 | 15 | 0.00011 | 5.9e-05 | 4.7e-13 |
| 16 | GO:0006383 | transcription by RNA polymerase III | 8 | 0 | 0.04 | 17 | 0.00015 | 0.00015 | 0.00015 |
| 17 | GO:0006413 | translational initiation | 18 | 0 | 0.09 | 20 | 0.00026 | 0.00026 | 0.00026 |
| 18 | GO:0043628 | ncRNA 3’-end processing | 6 | 0 | 0.03 | 21 | 0.00031 | 0.00031 | 0.00031 |
| 19 | GO:0000184 | nuclear-transcribed mRNA catabolic proce… | 6 | 0 | 0.03 | 22 | 0.00046 | 0.00046 | 0.00046 |
| 20 | GO:0000245 | spliceosomal complex assembly | 7 | 0 | 0.04 | 23 | 0.00053 | 0.00053 | 0.00053 |
| 21 | GO:0016570 | histone modification | 30 | 0 | 0.15 | 16 | 0.00012 | 0.00076 | 6.9e-07 |
| 22 | GO:0006281 | DNA repair | 125 | 0 | 0.63 | 10 | 1.0e-05 | 0.00094 | 2.9e-07 |
| 23 | GO:0006368 | transcription elongation from RNA polyme… | 8 | 0 | 0.04 | 28 | 0.00141 | 0.00141 | 0.00141 |
| 24 | GO:0002098 | tRNA wobble uridine modification | 10 | 1 | 0.05 | 29 | 0.00152 | 0.00152 | 0.00152 |
| 25 | GO:0006744 | ubiquinone biosynthetic process | 5 | 0 | 0.03 | 31 | 0.00155 | 0.00155 | 0.00155 |
| 26 | GO:0016573 | histone acetylation | 12 | 0 | 0.06 | 32 | 0.00157 | 0.00157 | 0.00157 |
| 27 | GO:0016236 | macroautophagy | 6 | 0 | 0.03 | 33 | 0.00170 | 0.00170 | 0.00170 |
| 28 | GO:0043631 | RNA polyadenylation | 10 | 0 | 0.05 | 35 | 0.00182 | 0.00182 | 0.00182 |
| 29 | GO:0035304 | regulation of protein dephosphorylation | 6 | 0 | 0.03 | 37 | 0.00213 | 0.00213 | 0.00213 |
| 30 | GO:0045944 | positive regulation of transcription by … | 6 | 0 | 0.03 | 39 | 0.00223 | 0.00223 | 0.00223 |
| 32 | GO:0008033 | tRNA processing | 50 | 1 | 0.25 | 46 | 0.00317 | 0.00317 | 3.9e-07 |
| 33 | GO:0001510 | RNA methylation | 17 | 0 | 0.09 | 19 | 0.00016 | 0.00350 | 0.00016 |
| 34 | GO:0071985 | multivesicular body sorting pathway | 6 | 0 | 0.03 | 49 | 0.00414 | 0.00414 | 0.00414 |
| 35 | GO:0006888 | ER to Golgi vesicle-mediated transport | 16 | 0 | 0.08 | 52 | 0.00449 | 0.00449 | 0.00449 |
| 37 | GO:0043094 | cellular metabolic compound salvage | 7 | 1 | 0.04 | 53 | 0.00471 | 0.00471 | 0.00471 |
| 38 | GO:0048015 | phosphatidylinositol-mediated signaling | 9 | 0 | 0.05 | 56 | 0.00479 | 0.00479 | 0.00479 |
| 39 | GO:0009894 | regulation of catabolic process | 5 | 0 | 0.03 | 57 | 0.00484 | 0.00484 | 0.00484 |
| 40 | GO:0016226 | iron-sulfur cluster assembly | 8 | 0 | 0.04 | 58 | 0.00489 | 0.00489 | 0.00489 |
vennDiagram(vennCounts(cbind(weight01=significant.weight01,
lea=significant.lea,
elim=significant.elim)))
kable(data.frame(Term = Term(GOTERM[sigTerms]),
Definition = Definition(GOTERM[sigTerms]),
PValue=score(BP.ran.weight01)[sigTerms]),
caption = 'BP terms overrepresented in random environments')
| Term | Definition | PValue | |
|---|---|---|---|
| GO:0006412 | translation | The cellular metabolic process in which a protein is formed, using the sequence of a mature mRNA or circRNA molecule to specify the sequence of amino acids in a polypeptide chain. Translation is mediated by the ribosome, and begins with the formation of a ternary complex between aminoacylated initiator methionine tRNA, GTP, and initiation factor 2, which subsequently associates with the small subunit of the ribosome and an mRNA or circRNA. Translation ends with the release of a polypeptide chain from the ribosome. | 0.0000000 |
| GO:0006511 | ubiquitin-dependent protein catabolic process | The chemical reactions and pathways resulting in the breakdown of a protein or peptide by hydrolysis of its peptide bonds, initiated by the covalent attachment of a ubiquitin group, or multiple ubiquitin groups, to the protein. | 0.0000000 |
| GO:0016579 | protein deubiquitination | The removal of one or more ubiquitin groups from a protein. | 0.0000000 |
| GO:0000398 | mRNA splicing, via spliceosome | The joining together of exons from one or more primary transcripts of messenger RNA (mRNA) and the excision of intron sequences, via a spliceosomal mechanism, so that mRNA consisting only of the joined exons is produced. | 0.0000001 |
| GO:0006397 | mRNA processing | Any process involved in the conversion of a primary mRNA transcript into one or more mature mRNA(s) prior to translation into polypeptide. | 0.0000001 |
| GO:0006289 | nucleotide-excision repair | A DNA repair process in which a small region of the strand surrounding the damage is removed from the DNA helix as an oligonucleotide. The small gap left in the DNA helix is filled in by the sequential action of DNA polymerase and DNA ligase. Nucleotide excision repair recognizes a wide range of substrates, including damage caused by UV irradiation (pyrimidine dimers and 6-4 photoproducts) and chemicals (intrastrand cross-links and bulky adducts). | 0.0000004 |
| GO:0006351 | transcription, DNA-templated | The cellular synthesis of RNA on a template of DNA. | 0.0000005 |
| GO:0016567 | protein ubiquitination | The process in which one or more ubiquitin groups are added to a protein. | 0.0000047 |
| GO:0006396 | RNA processing | Any process involved in the conversion of one or more primary RNA transcripts into one or more mature RNA molecules. | 0.0000078 |
| GO:0006886 | intracellular protein transport | The directed movement of proteins in a cell, including the movement of proteins between specific compartments or structures within a cell, such as organelles of a eukaryotic cell. | 0.0000083 |
| GO:0006914 | autophagy | The cellular catabolic process in which cells digest parts of their own cytoplasm; allows for both recycling of macromolecular constituents under conditions of cellular stress and remodeling the intracellular structure for cell differentiation. | 0.0000150 |
| GO:0006367 | transcription initiation from RNA polymerase II promoter | Any process involved in the assembly of the RNA polymerase II preinitiation complex (PIC) at an RNA polymerase II promoter region of a DNA template, resulting in the subsequent synthesis of RNA from that promoter. The initiation phase includes PIC assembly and the formation of the first few bonds in the RNA chain, including abortive initiation, which occurs when the first few nucleotides are repeatedly synthesized and then released. Promoter clearance, or release, is the transition between the initiation and elongation phases of transcription. | 0.0000178 |
| GO:0006352 | DNA-templated transcription, initiation | Any process involved in the assembly of the RNA polymerase preinitiation complex (PIC) at the core promoter region of a DNA template, resulting in the subsequent synthesis of RNA from that promoter. The initiation phase includes PIC assembly and the formation of the first few bonds in the RNA chain, including abortive initiation, which occurs when the first few nucleotides are repeatedly synthesized and then released. The initiation phase ends just before and does not include promoter clearance, or release, which is the transition between the initiation and elongation phases of transcription. | 0.0000402 |
| GO:0030163 | protein catabolic process | The chemical reactions and pathways resulting in the breakdown of a protein by the destruction of the native, active configuration, with or without the hydrolysis of peptide bonds. | 0.0000588 |
| GO:0006383 | transcription by RNA polymerase III | The synthesis of RNA from a DNA template by RNA polymerase III, originating at an RNAP III promoter. | 0.0001480 |
| GO:0006413 | translational initiation | The process preceding formation of the peptide bond between the first two amino acids of a protein. This includes the formation of a complex of the ribosome, mRNA or circRNA, and an initiation complex that contains the first aminoacyl-tRNA. | 0.0002606 |
| GO:0043628 | ncRNA 3’-end processing | Any process involved in forming the mature 3’ end of a non-coding RNA molecule. | 0.0003081 |
| GO:0000184 | nuclear-transcribed mRNA catabolic process, nonsense-mediated decay | The nonsense-mediated decay pathway for nuclear-transcribed mRNAs degrades mRNAs in which an amino-acid codon has changed to a nonsense codon; this prevents the translation of such mRNAs into truncated, and potentially harmful, proteins. | 0.0004563 |
| GO:0000245 | spliceosomal complex assembly | The aggregation, arrangement and bonding together of a spliceosomal complex, a ribonucleoprotein apparatus that catalyzes nuclear mRNA splicing via transesterification reactions. | 0.0005274 |
| GO:0016570 | histone modification | The covalent alteration of one or more amino acid residues within a histone protein. | 0.0007554 |
| GO:0006281 | DNA repair | The process of restoring DNA after damage. Genomes are subject to damage by chemical and physical agents in the environment (e.g. UV and ionizing radiations, chemical mutagens, fungal and bacterial toxins, etc.) and by free radicals or alkylating agents endogenously generated in metabolism. DNA is also damaged because of errors during its replication. A variety of different DNA repair pathways have been reported that include direct reversal, base excision repair, nucleotide excision repair, photoreactivation, bypass, double-strand break repair pathway, and mismatch repair pathway. | 0.0009435 |
| GO:0006368 | transcription elongation from RNA polymerase II promoter | The extension of an RNA molecule after transcription initiation and promoter clearance at an RNA polymerase II promoter by the addition of ribonucleotides catalyzed by RNA polymerase II. | 0.0014137 |
| GO:0002098 | tRNA wobble uridine modification | The process in which a uridine in position 34 of a tRNA is post-transcriptionally modified. | 0.0015167 |
| GO:0006744 | ubiquinone biosynthetic process | The chemical reactions and pathways resulting in the formation of ubiquinone, a lipid-soluble electron-transporting coenzyme. | 0.0015472 |
| GO:0016573 | histone acetylation | The modification of a histone by the addition of an acetyl group. | 0.0015664 |
| GO:0016236 | macroautophagy | The major inducible pathway for the general turnover of cytoplasmic constituents in eukaryotic cells, it is also responsible for the degradation of active cytoplasmic enzymes and organelles during nutrient starvation. Macroautophagy involves the formation of double-membrane-bounded autophagosomes which enclose the cytoplasmic constituent targeted for degradation in a membrane-bounded structure. Autophagosomes then fuse with a lysosome (or vacuole) releasing single-membrane-bounded autophagic bodies that are then degraded within the lysosome (or vacuole). Some types of macroautophagy, e.g. pexophagy, mitophagy, involve selective targeting of the targets to be degraded. | 0.0016976 |
| GO:0043631 | RNA polyadenylation | The enzymatic addition of a sequence of adenylyl residues at the 3’ end of an RNA molecule. | 0.0018202 |
| GO:0035304 | regulation of protein dephosphorylation | Any process that modulates the frequency, rate or extent of removal of phosphate groups from a protein. | 0.0021291 |
| GO:0045944 | positive regulation of transcription by RNA polymerase II | Any process that activates or increases the frequency, rate or extent of transcription from an RNA polymerase II promoter. | 0.0022302 |
| GO:0008033 | tRNA processing | The process in which a pre-tRNA molecule is converted to a mature tRNA, ready for addition of an aminoacyl group. | 0.0031681 |
| GO:0001510 | RNA methylation | Posttranscriptional addition of a methyl group to either a nucleotide or 2’-O ribose in a polyribonucleotide. Usually uses S-adenosylmethionine as a cofactor. | 0.0035013 |
| GO:0071985 | multivesicular body sorting pathway | A vesicle-mediated transport process in which transmembrane proteins are ubiquitylated to facilitate their entry into luminal vesicles of multivesicular bodies (MVBs); upon subsequent fusion of MVBs with lysosomes or vacuoles, the cargo proteins are degraded. | 0.0041427 |
| GO:0006888 | ER to Golgi vesicle-mediated transport | The directed movement of substances from the endoplasmic reticulum (ER) to the Golgi, mediated by COP II vesicles. Small COP II coated vesicles form from the ER and then fuse directly with the cis-Golgi. Larger structures are transported along microtubules to the cis-Golgi. | 0.0044884 |
| GO:0043094 | cellular metabolic compound salvage | Any process which produces a useful metabolic compound from derivatives of it without de novo synthesis, as carried out by individual cells. | 0.0047084 |
| GO:0048015 | phosphatidylinositol-mediated signaling | A series of molecular signals in which a cell uses a phosphatidylinositol-mediated signaling to convert a signal into a response. Phosphatidylinositols include phosphatidylinositol (PtdIns) and its phosphorylated derivatives. | 0.0047888 |
| GO:0009894 | regulation of catabolic process | Any process that modulates the frequency, rate, or extent of the chemical reactions and pathways resulting in the breakdown of substances. | 0.0048416 |
| GO:0016226 | iron-sulfur cluster assembly | The incorporation of iron and exogenous sulfur into a metallo-sulfur cluster. | 0.0048899 |
I think the GO graph is useful to see the relationship among the significant terms. But too large graphs are impossible to read. I don’t know how to split the graph in meaningful subgraphs.
showSigOfNodes(GOminus.BP, score(BP.ran.weight01),
firstSigNodes = sum(significant.elim),
wantedNodes = sigTerms)
## $dag
## A graphNEL graph with directed edges
## Number of Nodes = 279
## Number of Edges = 535
##
## $complete.dag
## [1] "A graph with 279 nodes."
This is just a example of the most significant GO term:
showGroupDensity(GOminus.BP, orderedTerms[1], rm.one=FALSE)
showGroupDensity(GOminus.BP, orderedTerms[2], rm.one=FALSE)
showGroupDensity(GOminus.BP, orderedTerms[3], rm.one=FALSE)
orderedTerms <- names(sort(score(MF.ran.weight01)))
significant.weight01 <- score(MF.ran.weight01)[orderedTerms] <= 0.005
significant.lea <- score(MF.ran.lea)[orderedTerms] <= 0.005
significant.elim <- score(MF.ran.elim)[orderedTerms] <= 0.005
sigTerms <- orderedTerms[significant.weight01 & significant.lea & significant.elim]
# sigTerms gets updated, and is already used in the code, but I need a permanent copy for later.
MF.ran.sigTerms <- sigTerms
MF.all <- GenTable(GOminus.MF, elim=MF.ran.elim, weight01=MF.ran.weight01, lea=MF.ran.lea,
orderBy="weight01", ranksOf="elim", topNodes=sum(significant.elim))
kable(
MF.all[MF.all$GO.ID %in% sigTerms,],
caption = "Most significant terms of the Molecular Function ontology.")
| GO.ID | Term | Annotated | Significant | Expected | Rank in elim | elim | weight01 | lea | |
|---|---|---|---|---|---|---|---|---|---|
| 1 | GO:0003735 | structural constituent of ribosome | 113 | 0 | 0.57 | 1 | 5.2e-23 | 5.2e-23 | 5.2e-23 |
| 2 | GO:0003723 | RNA binding | 250 | 1 | 1.26 | 2 | 3.1e-12 | 4.2e-12 | 6.8e-18 |
| 3 | GO:0036459 | thiol-dependent ubiquitinyl hydrolase ac… | 43 | 3 | 0.22 | 3 | 6.5e-08 | 6.5e-08 | 1.1e-10 |
| 4 | GO:0003676 | nucleic acid binding | 1217 | 4 | 6.14 | 7 | 1.9e-06 | 7.2e-08 | 9.7e-07 |
| 5 | GO:0061630 | ubiquitin protein ligase activity | 27 | 1 | 0.14 | 4 | 2.1e-07 | 2.1e-07 | 2.1e-07 |
| 6 | GO:0003712 | transcription coregulator activity | 42 | 0 | 0.21 | 6 | 7.5e-07 | 2.4e-07 | 7.5e-07 |
| 7 | GO:0004842 | ubiquitin-protein transferase activity | 76 | 2 | 0.38 | 5 | 2.6e-07 | 2.6e-07 | 3.9e-11 |
| 8 | GO:0003899 | DNA-directed 5’-3’ RNA polymerase activi… | 27 | 0 | 0.14 | 8 | 2.6e-06 | 2.6e-06 | 2.6e-06 |
| 9 | GO:0004298 | threonine-type endopeptidase activity | 15 | 0 | 0.08 | 9 | 1.2e-05 | 1.2e-05 | 1.2e-05 |
| 12 | GO:0003743 | translation initiation factor activity | 28 | 0 | 0.14 | 10 | 5.4e-05 | 5.4e-05 | 5.4e-05 |
| 13 | GO:0046982 | protein heterodimerization activity | 66 | 0 | 0.33 | 11 | 7.3e-05 | 7.3e-05 | 7.3e-05 |
| 14 | GO:0031625 | ubiquitin protein ligase binding | 10 | 0 | 0.05 | 12 | 0.00038 | 0.00038 | 0.00038 |
| 15 | GO:0004843 | thiol-dependent ubiquitin-specific prote… | 12 | 1 | 0.06 | 13 | 0.00038 | 0.00038 | 0.00038 |
| 16 | GO:0004674 | protein serine/threonine kinase activity | 76 | 0 | 0.38 | 20 | 0.00184 | 0.00069 | 0.00184 |
| 17 | GO:0003729 | mRNA binding | 30 | 1 | 0.15 | 16 | 0.00097 | 0.00097 | 0.00097 |
| 18 | GO:0008168 | methyltransferase activity | 97 | 0 | 0.49 | 15 | 0.00081 | 0.00106 | 1.1e-06 |
| 19 | GO:0000166 | nucleotide binding | 1117 | 5 | 5.64 | 17 | 0.00099 | 0.00118 | 1.3e-05 |
| 20 | GO:0004402 | histone acetyltransferase activity | 9 | 0 | 0.05 | 18 | 0.00123 | 0.00123 | 0.00123 |
| 22 | GO:0008137 | NADH dehydrogenase (ubiquinone) activity | 12 | 0 | 0.06 | 22 | 0.00204 | 0.00204 | 0.00204 |
| 23 | GO:0051536 | iron-sulfur cluster binding | 35 | 0 | 0.18 | 21 | 0.00190 | 0.00211 | 0.00190 |
| 27 | GO:0005524 | ATP binding | 786 | 2 | 3.97 | 24 | 0.00280 | 0.00280 | 0.00280 |
| 28 | GO:0008536 | Ran GTPase binding | 14 | 0 | 0.07 | 25 | 0.00322 | 0.00322 | 0.00322 |
| 29 | GO:0003684 | damaged DNA binding | 10 | 0 | 0.05 | 26 | 0.00330 | 0.00330 | 0.00330 |
vennDiagram(vennCounts(cbind(weight01=significant.weight01,
lea=significant.lea,
elim=significant.elim)))
kable(data.frame(Term=Term(GOTERM[sigTerms]),
Definition=Definition(GOTERM[sigTerms]),
PValue=score(MF.ran.weight01)[sigTerms]),
caption = 'MF terms overrepresented in random environments')
| Term | Definition | PValue | |
|---|---|---|---|
| GO:0003735 | structural constituent of ribosome | The action of a molecule that contributes to the structural integrity of the ribosome. | 0.0000000 |
| GO:0003723 | RNA binding | Interacting selectively and non-covalently with an RNA molecule or a portion thereof. | 0.0000000 |
| GO:0036459 | thiol-dependent ubiquitinyl hydrolase activity | Catalysis of the thiol-dependent hydrolysis of an ester, thioester, amide, peptide or isopeptide bond formed by the C-terminal glycine of ubiquitin. | 0.0000001 |
| GO:0003676 | nucleic acid binding | Interacting selectively and non-covalently with any nucleic acid. | 0.0000001 |
| GO:0061630 | ubiquitin protein ligase activity | Catalysis of the transfer of ubiquitin to a substrate protein via the reaction X-ubiquitin + S -> X + S-ubiquitin, where X is either an E2 or E3 enzyme, the X-ubiquitin linkage is a thioester bond, and the S-ubiquitin linkage is an amide bond: an isopeptide bond between the C-terminal glycine of ubiquitin and the epsilon-amino group of lysine residues in the substrate or, in the linear extension of ubiquitin chains, a peptide bond the between the C-terminal glycine and N-terminal methionine of ubiquitin residues. | 0.0000002 |
| GO:0003712 | transcription coregulator activity | A protein or a member of a complex that interacts specifically and non-covalently with a DNA-bound DNA-binding transcription factor to either activate or repress the transcription of specific genes. Coregulators often act by altering chromatin structure and modifications. For example, one class of transcription coregulators modifies chromatin structure through covalent modification of histones. A second ATP-dependent class modifies the conformation of chromatin. A third class modulates interactions of DNA-binding transcription factor with other transcription coregulators. | 0.0000002 |
| GO:0004842 | ubiquitin-protein transferase activity | Catalysis of the transfer of ubiquitin from one protein to another via the reaction X-Ub + Y –> Y-Ub + X, where both X-Ub and Y-Ub are covalent linkages. | 0.0000003 |
| GO:0003899 | DNA-directed 5’-3’ RNA polymerase activity | Catalysis of the reaction: nucleoside triphosphate + RNA(n) = diphosphate + RNA(n+1). Utilizes a DNA template, i.e. the catalysis of DNA-template-directed extension of the 3’-end of an RNA strand by one nucleotide at a time. Can initiate a chain ‘de novo’. | 0.0000026 |
| GO:0004298 | threonine-type endopeptidase activity | Catalysis of the hydrolysis of internal peptide bonds in a polypeptide chain by a mechanism in which the hydroxyl group of a threonine residue at the active center acts as a nucleophile. | 0.0000119 |
| GO:0003743 | translation initiation factor activity | Functions in the initiation of ribosome-mediated translation of mRNA into a polypeptide. | 0.0000541 |
| GO:0046982 | protein heterodimerization activity | Interacting selectively and non-covalently with a nonidentical protein to form a heterodimer. | 0.0000729 |
| GO:0031625 | ubiquitin protein ligase binding | Interacting selectively and non-covalently with a ubiquitin protein ligase enzyme, any of the E3 proteins. | 0.0003755 |
| GO:0004843 | thiol-dependent ubiquitin-specific protease activity | Catalysis of the thiol-dependent hydrolysis of a peptide bond formed by the C-terminal glycine of ubiquitin and another protein. | 0.0003848 |
| GO:0004674 | protein serine/threonine kinase activity | Catalysis of the reactions: ATP + protein serine = ADP + protein serine phosphate, and ATP + protein threonine = ADP + protein threonine phosphate. | 0.0006932 |
| GO:0003729 | mRNA binding | Interacting selectively and non-covalently with messenger RNA (mRNA), an intermediate molecule between DNA and protein. mRNA includes UTR and coding sequences, but does not contain introns. | 0.0009711 |
| GO:0008168 | methyltransferase activity | Catalysis of the transfer of a methyl group to an acceptor molecule. | 0.0010650 |
| GO:0000166 | nucleotide binding | Interacting selectively and non-covalently with a nucleotide, any compound consisting of a nucleoside that is esterified with (ortho)phosphate or an oligophosphate at any hydroxyl group on the ribose or deoxyribose. | 0.0011756 |
| GO:0004402 | histone acetyltransferase activity | Catalysis of the reaction: acetyl-CoA + histone = CoA + acetyl-histone. | 0.0012320 |
| GO:0008137 | NADH dehydrogenase (ubiquinone) activity | Catalysis of the reaction: NADH + H+ + ubiquinone = NAD+ + ubiquinol. | 0.0020413 |
| GO:0051536 | iron-sulfur cluster binding | Interacting selectively and non-covalently with an iron-sulfur cluster, a combination of iron and sulfur atoms. | 0.0021125 |
| GO:0005524 | ATP binding | Interacting selectively and non-covalently with ATP, adenosine 5’-triphosphate, a universally important coenzyme and enzyme regulator. | 0.0027954 |
| GO:0008536 | Ran GTPase binding | Interacting selectively and non-covalently with Ran, a conserved Ras-like GTP-binding protein, implicated in nucleocytoplasmic transport, cell cycle progression, spindle assembly, nuclear organization and nuclear envelope (NE) assembly. | 0.0032245 |
| GO:0003684 | damaged DNA binding | Interacting selectively and non-covalently with damaged DNA. | 0.0032998 |
| GO:0003887 | DNA-directed DNA polymerase activity | Catalysis of the reaction: deoxynucleoside triphosphate + DNA(n) = diphosphate + DNA(n+1); the synthesis of DNA from deoxyribonucleotide triphosphates in the presence of a DNA template and a 3’hydroxyl group. | 0.0048384 |
| GO:0060590 | ATPase regulator activity | Modulates the rate of ATP hydrolysis by an ATPase. | 0.0048503 |
| GO:0043022 | ribosome binding | Interacting selectively and non-covalently with any part of a ribosome. | 0.0048964 |
showSigOfNodes(GOminus.MF, score(MF.ran.weight01),
firstSigNodes = sum(significant.elim),
wantedNodes = sigTerms)
## $dag
## A graphNEL graph with directed edges
## Number of Nodes = 105
## Number of Edges = 126
##
## $complete.dag
## [1] "A graph with 105 nodes."
showGroupDensity(GOminus.MF, orderedTerms[1], rm.one=FALSE)
showGroupDensity(GOminus.MF, orderedTerms[2], rm.one=FALSE)
showGroupDensity(GOminus.MF, orderedTerms[3], rm.one=FALSE)
orderedTerms <- names(sort(score(CC.ran.weight01)))
significant.weight01 <- score(CC.ran.weight01)[orderedTerms] <= 0.005
significant.lea <- score(CC.ran.lea)[orderedTerms] <= 0.005
significant.elim <- score(CC.ran.elim)[orderedTerms] <= 0.005
sigTerms <- orderedTerms[significant.weight01 & significant.lea & significant.elim]
# sigTerms gets updated, and is already used in the code, but I need a permanent copy for later.
CC.ran.sigTerms <- sigTerms
CC.all <- GenTable(GOminus.CC, elim=CC.ran.elim, weight01=CC.ran.weight01, lea=CC.ran.lea,
orderBy="weight01", ranksOf="elim", topNodes=sum(significant.elim))
kable(
CC.all[CC.all$GO.ID %in% sigTerms,],
caption = "Most overrepresented Cellular Component terms in random environments.")
| GO.ID | Term | Annotated | Significant | Expected | Rank in elim | elim | weight01 | lea | |
|---|---|---|---|---|---|---|---|---|---|
| 1 | GO:0005840 | ribosome | 115 | 0 | 0.59 | 1 | 1.6e-19 | 1.6e-18 | 6.4e-22 |
| 2 | GO:0005634 | nucleus | 531 | 2 | 2.71 | 2 | 6.7e-11 | 6.7e-11 | < 1e-30 |
| 3 | GO:0016592 | mediator complex | 24 | 0 | 0.12 | 3 | 1.3e-06 | 1.3e-06 | 1.3e-06 |
| 5 | GO:0005681 | spliceosomal complex | 12 | 0 | 0.06 | 4 | 1.1e-05 | 1.1e-05 | 1.1e-05 |
| 6 | GO:0090575 | RNA polymerase II transcription factor c… | 23 | 0 | 0.12 | 5 | 1.9e-05 | 1.9e-05 | 1.8e-07 |
| 7 | GO:0016591 | RNA polymerase II, holoenzyme | 20 | 0 | 0.10 | 6 | 2.2e-05 | 2.2e-05 | 2.2e-07 |
| 8 | GO:0008023 | transcription elongation factor complex | 10 | 0 | 0.05 | 10 | 0.00023 | 0.00023 | 0.00023 |
| 9 | GO:0044451 | nucleoplasm part | 96 | 0 | 0.49 | 21 | 0.00198 | 0.00033 | 3.0e-19 |
| 10 | GO:0044444 | cytoplasmic part | 511 | 3 | 2.61 | 7 | 2.8e-05 | 0.00067 | 5.3e-24 |
| 11 | GO:0070603 | SWI/SNF superfamily-type complex | 19 | 0 | 0.10 | 12 | 0.00043 | 0.00098 | 0.00043 |
| 12 | GO:0015934 | large ribosomal subunit | 14 | 0 | 0.07 | 15 | 0.00105 | 0.00105 | 0.00105 |
| 13 | GO:0005669 | transcription factor TFIID complex | 7 | 0 | 0.04 | 16 | 0.00107 | 0.00107 | 0.00107 |
| 14 | GO:0044455 | mitochondrial membrane part | 26 | 0 | 0.13 | 18 | 0.00123 | 0.00150 | 0.00123 |
| 15 | GO:0008180 | COP9 signalosome | 7 | 0 | 0.04 | 23 | 0.00221 | 0.00221 | 0.00221 |
| 16 | GO:0010008 | endosome membrane | 9 | 0 | 0.05 | 24 | 0.00234 | 0.00234 | 0.00234 |
| 17 | GO:0032991 | protein-containing complex | 886 | 1 | 4.52 | 30 | 0.00400 | 0.00305 | < 1e-30 |
| 18 | GO:0019773 | proteasome core complex, alpha-subunit c… | 7 | 0 | 0.04 | 31 | 0.00434 | 0.00434 | 0.00434 |
vennDiagram(vennCounts(cbind(weight01=significant.weight01,
lea=significant.lea,
elim=significant.elim)))
kable(data.frame(Term=Term(GOTERM[sigTerms]),
Definition=Definition(GOTERM[sigTerms]),
PValue=score(CC.ran.weight01)[sigTerms]),
caption = 'CC terms overrepresented in random environments.')
| Term | Definition | PValue | |
|---|---|---|---|
| GO:0005840 | ribosome | An intracellular organelle, about 200 A in diameter, consisting of RNA and protein. It is the site of protein biosynthesis resulting from translation of messenger RNA (mRNA). It consists of two subunits, one large and one small, each containing only protein and RNA. Both the ribosome and its subunits are characterized by their sedimentation coefficients, expressed in Svedberg units (symbol: S). Hence, the prokaryotic ribosome (70S) comprises a large (50S) subunit and a small (30S) subunit, while the eukaryotic ribosome (80S) comprises a large (60S) subunit and a small (40S) subunit. Two sites on the ribosomal large subunit are involved in translation, namely the aminoacyl site (A site) and peptidyl site (P site). Ribosomes from prokaryotes, eukaryotes, mitochondria, and chloroplasts have characteristically distinct ribosomal proteins. | 0.0000000 |
| GO:0005634 | nucleus | A membrane-bounded organelle of eukaryotic cells in which chromosomes are housed and replicated. In most cells, the nucleus contains all of the cell’s chromosomes except the organellar chromosomes, and is the site of RNA synthesis and processing. In some species, or in specialized cell types, RNA metabolism or DNA replication may be absent. | 0.0000000 |
| GO:0016592 | mediator complex | A protein complex that interacts with the carboxy-terminal domain of the largest subunit of RNA polymerase II and plays an active role in transducing the signal from a transcription factor to the transcriptional machinery. The mediator complex is required for activation of transcription of most protein-coding genes, but can also act as a transcriptional corepressor. The Saccharomyces complex contains several identifiable subcomplexes: a head domain comprising Srb2, -4, and -5, Med6, -8, and -11, and Rox3 proteins; a middle domain comprising Med1, -4, and -7, Nut1 and -2, Cse2, Rgr1, Soh1, and Srb7 proteins; a tail consisting of Gal11p, Med2p, Pgd1p, and Sin4p; and a regulatory subcomplex comprising Ssn2, -3, and -8, and Srb8 proteins. Metazoan mediator complexes have similar modular structures and include homologs of yeast Srb and Med proteins. | 0.0000013 |
| GO:0005681 | spliceosomal complex | Any of a series of ribonucleoprotein complexes that contain snRNA(s) and small nuclear ribonucleoproteins (snRNPs), and are formed sequentially during the spliceosomal splicing of one or more substrate RNAs, and which also contain the RNA substrate(s) from the initial target RNAs of splicing, the splicing intermediate RNA(s), to the final RNA products. During cis-splicing, the initial target RNA is a single, contiguous RNA transcript, whether mRNA, snoRNA, etc., and the released products are a spliced RNA and an excised intron, generally as a lariat structure. During trans-splicing, there are two initial substrate RNAs, the spliced leader RNA and a pre-mRNA. | 0.0000108 |
| GO:0090575 | RNA polymerase II transcription factor complex | A transcription factor complex that acts at a regulatory region of a gene transcribed by RNA polymerase II. | 0.0000190 |
| GO:0016591 | RNA polymerase II, holoenzyme | A nuclear DNA-directed RNA polymerase complex containing an RNA polymerase II core enzyme as well as additional proteins and transcription factor complexes, that are capable of promoter recognition and transcription initiation from an RNA polymerase II promoter in vivo. These additional components may include general transcription factor complexes TFIIA, TFIID, TFIIE, TFIIF, or TFIIH, as well as Mediator, SWI/SNF, GCN5, or SRBs and confer the ability to recognize promoters. | 0.0000217 |
| GO:0008023 | transcription elongation factor complex | Any protein complex that interacts with RNA polymerase II to increase (positive transcription elongation factor) or reduce (negative transcription elongation factor) the rate of transcription elongation. | 0.0002321 |
| GO:0044451 | nucleoplasm part | Any constituent part of the nucleoplasm, that part of the nuclear content other than the chromosomes or the nucleolus. | 0.0003254 |
| GO:0044444 | cytoplasmic part | Any constituent part of the cytoplasm, all of the contents of a cell excluding the plasma membrane and nucleus, but including other subcellular structures. | 0.0006735 |
| GO:0070603 | SWI/SNF superfamily-type complex | A protein complex that contains an ortholog of the Saccharomyces ATPase Swi2/Snf2 as one of the catalytic subunit components (ATPase) and mediates assembly of nucleosomes, changes to the spacing or structure of nucleosomes, or some combination of those activities in a manner that requires ATP. | 0.0009792 |
| GO:0015934 | large ribosomal subunit | The larger of the two subunits of a ribosome. Two sites on the ribosomal large subunit are involved in translation, namely the aminoacyl site (A site) and peptidyl site (P site). | 0.0010546 |
| GO:0005669 | transcription factor TFIID complex | A complex composed of TATA binding protein (TBP) and TBP associated factors (TAFs); the total mass is typically about 800 kDa. Most of the TAFs are conserved across species. In TATA-containing promoters for RNA polymerase II (Pol II), TFIID is believed to recognize at least two distinct elements, the TATA element and a downstream promoter element. TFIID is also involved in recognition of TATA-less Pol II promoters. Binding of TFIID to DNA is necessary but not sufficient for transcription initiation from most RNA polymerase II promoters. | 0.0010651 |
| GO:0044455 | mitochondrial membrane part | Any constituent part of a mitochondrial membrane, either of the lipid bilayers that surround the mitochondrion and form the mitochondrial envelope. | 0.0015008 |
| GO:0008180 | COP9 signalosome | A protein complex that catalyzes the deneddylation of proteins, including the cullin component of SCF ubiquitin E3 ligase; deneddylation increases the activity of cullin family ubiquitin ligases. The signalosome is involved in many regulatory process, including some which control development, in many species; also regulates photomorphogenesis in plants; in many species its subunits are highly similar to those of the proteasome. | 0.0022149 |
| GO:0010008 | endosome membrane | The lipid bilayer surrounding an endosome. | 0.0023386 |
| GO:0032991 | protein-containing complex | A stable assembly of two or more macromolecules, i.e. proteins, nucleic acids, carbohydrates or lipids, in which at least one component is a protein and the constituent parts function together. | 0.0030549 |
| GO:0019773 | proteasome core complex, alpha-subunit complex | The proteasome core subcomplex that constitutes the two outer rings of the proteasome core complex. An example of this component is found in Mus musculus. | 0.0043397 |
showSigOfNodes(GOminus.CC, score(CC.ran.weight01),
firstSigNodes = sum(significant.elim),
wantedNodes = sigTerms)
## $dag
## A graphNEL graph with directed edges
## Number of Nodes = 100
## Number of Edges = 191
##
## $complete.dag
## [1] "A graph with 100 nodes."
showGroupDensity(GOminus.CC, orderedTerms[1], rm.one=FALSE)
showGroupDensity(GOminus.CC, orderedTerms[2], rm.one=FALSE)
showGroupDensity(GOminus.CC, orderedTerms[3], rm.one=FALSE)
I want to compare these results with those from 2020-01-14, where I used the \(p\) value of the differential expression analysis to order the genes. I can import the data from 2020-01-14, but I need to do it in a new environment to prevent overwriting the current results.
load(ENRICHMENT, ex <- new.env())
allTerms <- usedGO(GOdata.BP)
BP.pvalue.sigTerms <- with(ex, BP.pvalue.sigTerms)
vennDiagram(vennCounts(cbind(TStat = allTerms %in% c(BP.reg.sigTerms, BP.ran.sigTerms),
PValue = allTerms %in% BP.pvalue.sigTerms)))
# New terms:
as.data.frame(Term(GOTERM[setdiff(c(BP.reg.sigTerms, BP.ran.sigTerms), BP.pvalue.sigTerms)]))
## Term(GOTERM[setdiff(c(BP.reg.sigTerms, BP.ran.sigTerms), BP.pvalue.sigTerms)])
## GO:0006814 sodium ion transport
## GO:0006030 chitin metabolic process
## GO:0006486 protein glycosylation
## GO:0005975 carbohydrate metabolic process
## GO:0015074 DNA integration
## GO:0006811 ion transport
## GO:0007156 homophilic cell adhesion via plasma membrane adhesion molecules
## GO:0070588 calcium ion transmembrane transport
## GO:0007154 cell communication
## GO:0007018 microtubule-based movement
## GO:0006412 translation
## GO:0006511 ubiquitin-dependent protein catabolic process
## GO:0016579 protein deubiquitination
## GO:0000398 mRNA splicing, via spliceosome
## GO:0006397 mRNA processing
## GO:0006351 transcription, DNA-templated
## GO:0016567 protein ubiquitination
## GO:0006396 RNA processing
## GO:0006886 intracellular protein transport
## GO:0006914 autophagy
## GO:0006367 transcription initiation from RNA polymerase II promoter
## GO:0006352 DNA-templated transcription, initiation
## GO:0030163 protein catabolic process
## GO:0006383 transcription by RNA polymerase III
## GO:0006413 translational initiation
## GO:0043628 ncRNA 3'-end processing
## GO:0000184 nuclear-transcribed mRNA catabolic process, nonsense-mediated decay
## GO:0000245 spliceosomal complex assembly
## GO:0016570 histone modification
## GO:0006281 DNA repair
## GO:0006368 transcription elongation from RNA polymerase II promoter
## GO:0002098 tRNA wobble uridine modification
## GO:0006744 ubiquinone biosynthetic process
## GO:0016573 histone acetylation
## GO:0016236 macroautophagy
## GO:0043631 RNA polyadenylation
## GO:0035304 regulation of protein dephosphorylation
## GO:0045944 positive regulation of transcription by RNA polymerase II
## GO:0008033 tRNA processing
## GO:0001510 RNA methylation
## GO:0071985 multivesicular body sorting pathway
## GO:0006888 ER to Golgi vesicle-mediated transport
## GO:0043094 cellular metabolic compound salvage
## GO:0048015 phosphatidylinositol-mediated signaling
## GO:0009894 regulation of catabolic process
## GO:0016226 iron-sulfur cluster assembly
# Absent terms:
as.data.frame(Term(GOTERM[setdiff(BP.pvalue.sigTerms, c(BP.reg.sigTerms, BP.ran.sigTerms))]))
## Term(GOTERM[setdiff(BP.pvalue.sigTerms, c(BP.reg.sigTerms, BP.ran.sigTerms))])
## GO:0006468 protein phosphorylation
## GO:0055114 oxidation-reduction process
## GO:0007165 signal transduction
## GO:0006355 regulation of transcription, DNA-templated
## GO:0006470 protein dephosphorylation
## GO:0006979 response to oxidative stress
## GO:1901642 nucleoside transmembrane transport
## GO:0043161 proteasome-mediated ubiquitin-dependent protein catabolic process
## GO:0046942 carboxylic acid transport
allTerms <- usedGO(GOdata.MF)
MF.pvalue.sigTerms <- with(ex, MF.pvalue.sigTerms)
vennDiagram(vennCounts(cbind(TStat = allTerms %in% c(MF.reg.sigTerms, MF.ran.sigTerms),
PValue = allTerms %in% MF.pvalue.sigTerms)))
# New terms:
as.data.frame(Term(GOTERM[setdiff(c(MF.reg.sigTerms, MF.ran.sigTerms), MF.pvalue.sigTerms)]))
## Term(GOTERM[setdiff(c(MF.reg.sigTerms, MF.ran.sigTerms), MF.pvalue.sigTerms)])
## GO:0043565 sequence-specific DNA binding
## GO:0008061 chitin binding
## GO:0005272 sodium channel activity
## GO:0004888 transmembrane signaling receptor activity
## GO:0008417 fucosyltransferase activity
## GO:0005230 extracellular ligand-gated ion channel activity
## GO:0005328 neurotransmitter:sodium symporter activity
## GO:0051015 actin filament binding
## GO:0003777 microtubule motor activity
## GO:0020037 heme binding
## GO:0004096 catalase activity
## GO:0031409 pigment binding
## GO:0004983 neuropeptide Y receptor activity
## GO:0015293 symporter activity
## GO:0008378 galactosyltransferase activity
## GO:0004601 peroxidase activity
## GO:0004190 aspartic-type endopeptidase activity
## GO:0008528 G protein-coupled peptide receptor activity
## GO:0003735 structural constituent of ribosome
## GO:0003723 RNA binding
## GO:0036459 thiol-dependent ubiquitinyl hydrolase activity
## GO:0003676 nucleic acid binding
## GO:0003712 transcription coregulator activity
## GO:0004842 ubiquitin-protein transferase activity
## GO:0003899 DNA-directed 5'-3' RNA polymerase activity
## GO:0004298 threonine-type endopeptidase activity
## GO:0003743 translation initiation factor activity
## GO:0046982 protein heterodimerization activity
## GO:0031625 ubiquitin protein ligase binding
## GO:0004843 thiol-dependent ubiquitin-specific protease activity
## GO:0004674 protein serine/threonine kinase activity
## GO:0003729 mRNA binding
## GO:0008168 methyltransferase activity
## GO:0000166 nucleotide binding
## GO:0004402 histone acetyltransferase activity
## GO:0008137 NADH dehydrogenase (ubiquinone) activity
## GO:0051536 iron-sulfur cluster binding
## GO:0008536 Ran GTPase binding
## GO:0003684 damaged DNA binding
## GO:0003887 DNA-directed DNA polymerase activity
## GO:0060590 ATPase regulator activity
## GO:0043022 ribosome binding
# Absent terms:
as.data.frame(Term(GOTERM[setdiff(MF.pvalue.sigTerms, c(MF.reg.sigTerms, MF.ran.sigTerms))]))
## Term(GOTERM[setdiff(MF.pvalue.sigTerms, c(MF.reg.sigTerms, MF.ran.sigTerms))])
## GO:0005525 GTP binding
## GO:0004672 protein kinase activity
## GO:0005507 copper ion binding
## GO:0008138 protein tyrosine/serine/threonine phosphatase activity
## GO:0004198 calcium-dependent cysteine-type endopeptidase activity
## GO:0003924 GTPase activity
## GO:0042626 ATPase activity, coupled to transmembrane movement of substances
## GO:0005337 nucleoside transmembrane transporter activity
## GO:0016787 hydrolase activity
## GO:0008017 microtubule binding
## GO:0016840 carbon-nitrogen lyase activity
## GO:0047631 ADP-ribose diphosphatase activity
## GO:0005506 iron ion binding
allTerms <- usedGO(GOdata.CC)
CC.pvalue.sigTerms <- with(ex, CC.pvalue.sigTerms)
vennDiagram(vennCounts(cbind(TStat = allTerms %in% c(CC.reg.sigTerms, CC.ran.sigTerms),
PValue = allTerms %in% CC.pvalue.sigTerms)))
# New terms:
as.data.frame(Term(GOTERM[setdiff(c(CC.reg.sigTerms, CC.ran.sigTerms), CC.pvalue.sigTerms)]))
## Term(GOTERM[setdiff(c(CC.reg.sigTerms, CC.ran.sigTerms), CC.pvalue.sigTerms)])
## GO:0098797 plasma membrane protein complex
## GO:0031012 extracellular matrix
## GO:0034704 calcium channel complex
## GO:0005874 microtubule
## GO:0005929 cilium
## GO:0005858 axonemal dynein complex
## GO:0045211 postsynaptic membrane
## GO:0005840 ribosome
## GO:0005634 nucleus
## GO:0016592 mediator complex
## GO:0005681 spliceosomal complex
## GO:0016591 RNA polymerase II, holoenzyme
## GO:0008023 transcription elongation factor complex
## GO:0044451 nucleoplasm part
## GO:0044444 cytoplasmic part
## GO:0070603 SWI/SNF superfamily-type complex
## GO:0015934 large ribosomal subunit
## GO:0005669 transcription factor TFIID complex
## GO:0044455 mitochondrial membrane part
## GO:0008180 COP9 signalosome
## GO:0010008 endosome membrane
## GO:0032991 protein-containing complex
## GO:0019773 proteasome core complex, alpha-subunit complex
# Absent terms:
as.data.frame(Term(GOTERM[setdiff(CC.pvalue.sigTerms, c(CC.reg.sigTerms, CC.ran.sigTerms))]))
## Term(GOTERM[setdiff(CC.pvalue.sigTerms, c(CC.reg.sigTerms, CC.ran.sigTerms))])
## GO:0005887 integral component of plasma membrane
## GO:0098800 inner mitochondrial membrane protein complex
save(allgenes2GO,
GOdata.BP, BP.reg.elim, BP.reg.weight01, BP.reg.lea, BP.reg.sigTerms,
GOminus.BP, BP.ran.elim, BP.ran.weight01, BP.ran.lea, BP.ran.sigTerms,
GOdata.MF, MF.reg.elim, MF.reg.weight01, MF.reg.lea, MF.reg.sigTerms,
GOminus.MF, MF.ran.elim, MF.ran.weight01, MF.ran.lea, MF.ran.sigTerms,
GOdata.CC, CC.reg.elim, CC.reg.weight01, CC.reg.lea, CC.reg.sigTerms,
GOminus.CC, CC.ran.elim, CC.ran.weight01, CC.ran.lea, CC.ran.sigTerms,
file = paste('Enrichment', TAG, VAR, 'RData', sep='.'))
sessionInfo()
## R version 3.6.3 (2020-02-29)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.4 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] grid stats4 parallel stats graphics grDevices utils
## [8] datasets methods base
##
## other attached packages:
## [1] Rgraphviz_2.30.0 variancePartition_1.16.1 scales_1.1.0
## [4] foreach_1.5.0 ggplot2_3.3.0 limma_3.42.2
## [7] knitr_1.28 topGO_2.38.1 SparseM_1.78
## [10] GO.db_3.10.0 AnnotationDbi_1.48.0 IRanges_2.20.2
## [13] S4Vectors_0.24.4 Biobase_2.46.0 graph_1.64.0
## [16] BiocGenerics_0.32.0
##
## loaded via a namespace (and not attached):
## [1] bit64_0.9-7 splines_3.6.3 gtools_3.8.2
## [4] assertthat_0.2.1 statmod_1.4.34 highr_0.8
## [7] blob_1.2.1 yaml_2.2.1 progress_1.2.2
## [10] pillar_1.4.3 RSQLite_2.2.0 lattice_0.20-41
## [13] glue_1.4.0 digest_0.6.25 minqa_1.2.4
## [16] colorspace_1.4-1 htmltools_0.4.0 Matrix_1.2-18
## [19] plyr_1.8.6 pkgconfig_2.0.3 purrr_0.3.4
## [22] gdata_2.18.0 BiocParallel_1.20.1 lme4_1.1-23
## [25] tibble_3.0.1 ellipsis_0.3.0 withr_2.2.0
## [28] pbkrtest_0.4-8.6 magrittr_1.5 crayon_1.3.4
## [31] memoise_1.1.0 evaluate_0.14 doParallel_1.0.15
## [34] nlme_3.1-147 MASS_7.3-51.6 gplots_3.0.3
## [37] tools_3.6.3 prettyunits_1.1.1 hms_0.5.3
## [40] lifecycle_0.2.0 matrixStats_0.56.0 stringr_1.4.0
## [43] munsell_0.5.0 colorRamps_2.3 compiler_3.6.3
## [46] caTools_1.18.0 rlang_0.4.6 nloptr_1.2.2.1
## [49] iterators_1.0.12 bitops_1.0-6 rmarkdown_2.1
## [52] boot_1.3-25 gtable_0.3.0 codetools_0.2-16
## [55] DBI_1.1.0 reshape2_1.4.4 R6_2.4.1
## [58] dplyr_0.8.5 bit_1.1-15.2 KernSmooth_2.23-17
## [61] stringi_1.4.6 Rcpp_1.0.4.6 vctrs_0.2.4
## [64] tidyselect_1.0.0 xfun_0.13
Alexa, Adrian, and Jorg Rahnenfuhrer. 2019. TopGO: Enrichment Analysis for Gene Ontology.